A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems
Internal identifier: 001E68 (Main/Exploration); previous: 001E67; next: 001E69
Authors: Khaled Mostafa [Egypt]; I. Shaheen [Egypt]; M. Darwish [Egypt]; Ibrahim Farag [Egypt]
Source:
- Lecture Notes in Computer Science [0302-9743]; 1999.
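French descriptors
- Pascal (Inist): Arabe; Reconnaissance forme; Reconnaissance optique caractère; Reconnaissance écriture; Système intelligent
English descriptors
- KwdEn: Arabic; Handwriting recognition; Intelligent system; Optical character recognition; Pattern recognition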
Abstract
In this paper, we propose a new approach for detecting and correcting segmentation and recognition errors in Arabic OCR systems. The approach is suitable for both typewritten and handwritten script recognition systems. Error detection is based on the rules of the Arabic language and a morphological analyzer. This type of analysis has the advantage of keeping the dictionary to a practical size: a complete dictionary of roots (no more than 5,641 roots), the morphological rules, and all valid patterns can be stored in a file of moderate size. The characteristics of the recognition channel are modeled using a set of probabilistic finite state machines. Contextual information is exploited in the form of letter transition probabilities of both a predefined vocabulary (finite lexicon) and garbled text. The developed detection and correction modules have been incorporated as a post-processing phase in an Arabic handwritten cursive script recognition system. Experimental results show a considerable improvement in performance.
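The abstract only outlines the pipeline, so what follows is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It flags a word as an error when no known root fits any morphological pattern, then corrects it with a noisy-channel search over a finite lexicon: letter-transition (bigram) probabilities supply the context, and a single weighted edit alignment (substitutions for recognition errors, insertions/deletions for segmentation errors) stands in for the paper's probabilistic finite state machines. The roots, patterns, probabilities and function names (`is_valid_word`, `channel_logp`, `lm_logp`, `correct`) are hypothetical toy choices, and the garbled-text transition probabilities mentioned in the abstract are not modeled.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper). In a pattern, the letters
# ف, ع, ل stand for the 1st, 2nd and 3rd letters of a triliteral root.
ROOTS = {"كتب", "درس", "علم"}
PATTERNS = ["فعل", "فاعل", "مفعول"]
SLOTS = {"ف": 0, "ع": 1, "ل": 2}


def instantiate(root, pattern):
    """Build a word by filling the pattern's slots with the root letters."""
    return "".join(root[SLOTS[p]] if p in SLOTS else p for p in pattern)


def extract_root(word, pattern):
    """Return the root if `word` instantiates `pattern`, else None."""
    if len(word) != len(pattern):
        return None
    root = [""] * 3
    for w, p in zip(word, pattern):
        if p in SLOTS:
            root[SLOTS[p]] = w  # slot position: read off a root letter
        elif w != p:
            return None  # fixed letter of the pattern does not match
    return "".join(root)


def is_valid_word(word):
    """Detection: accept a word only if some pattern yields a known root."""
    return any(extract_root(word, p) in ROOTS for p in PATTERNS)


LEXICON = sorted(instantiate(r, p) for r in ROOTS for p in PATTERNS)

# Hypothetical channel parameters standing in for the PFSMs: letters that
# differ only in their dots are easily confused, and insertions/deletions
# mimic segmentation errors (a letter split in two or merged away).
LOG_CORRECT = math.log(0.9)
CONFUSIONS = {("ت", "ن"): 0.05, ("ت", "ب"): 0.05, ("ب", "ت"): 0.05}
LOG_OTHER = math.log(1e-4)
LOG_DEL = math.log(1e-3)  # intended letter missing from the OCR output
LOG_INS = math.log(1e-3)  # spurious extra letter in the OCR output


def sub_logp(intended, observed):
    if intended == observed:
        return LOG_CORRECT
    if (intended, observed) in CONFUSIONS:
        return math.log(CONFUSIONS[intended, observed])
    return LOG_OTHER


def channel_logp(observed, intended):
    """Best-alignment (unnormalized) log-score of reading `intended` as `observed`."""
    n, m = len(intended), len(observed)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:
                best[i][j] = max(best[i][j], best[i - 1][j - 1]
                                 + sub_logp(intended[i - 1], observed[j - 1]))
            if i:
                best[i][j] = max(best[i][j], best[i - 1][j] + LOG_DEL)
            if j:
                best[i][j] = max(best[i][j], best[i][j - 1] + LOG_INS)
    return best[n][m]


# Letter-transition counts over the lexicon, with ^ and $ as word boundaries.
PAIRS, HISTORY = Counter(), Counter()
for word in LEXICON:
    seq = "^" + word + "$"
    for a, b in zip(seq, seq[1:]):
        PAIRS[a, b] += 1
        HISTORY[a] += 1
VOCAB = len(set("".join(LEXICON))) + 1  # possible next symbols: letters + $


def lm_logp(word):
    """Add-one smoothed bigram log-probability of a word."""
    seq = "^" + word + "$"
    return sum(math.log((PAIRS[a, b] + 1) / (HISTORY[a] + VOCAB))
               for a, b in zip(seq, seq[1:]))


def correct(observed):
    """Keep words that pass detection; otherwise return the best lexicon word."""
    if is_valid_word(observed):
        return observed
    return max(LEXICON, key=lambda w: lm_logp(w) + channel_logp(observed, w))


print(correct("مكنوب"))  # ت misread as ن -> مكتوب
print(correct("مكوب"))   # ت lost in a merge -> مكتوب
```

Working in log space makes the context score and the channel score simply additive. With these toy parameters the channel term dominates, so the example illustrates how the two scores combine rather than realistic parameter values.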
Url: https://api.istex.fr/document/3E3F186B8873FE74B41C4CF4826422234F1E68BA/fulltext/pdf
DOI: 10.1007/978-3-540-48765-4_57
Affiliations: Cairo University, 12613 Giza, Egypt
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000546
- to stream Istex, to step Curation: 000539
- to stream Istex, to step Checkpoint: 001419
- to stream Main, to step Merge: 001F77
- to stream PascalFrancis, to step Corpus: 000813
- to stream PascalFrancis, to step Curation: 000B81
- to stream PascalFrancis, to step Checkpoint: 000799
- to stream Main, to step Merge: 002180
- to stream Main, to step Curation: 001E68
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems</title>
<author><name sortKey="Mostafa, Khaled" sort="Mostafa, Khaled" uniqKey="Mostafa K" first="Khaled" last="Mostafa">Khaled Mostafa</name>
</author>
<author><name sortKey="Shaheen, I" sort="Shaheen, I" uniqKey="Shaheen I" first="I." last="Shaheen">I. Shaheen</name>
</author>
<author><name sortKey="Darwish, M" sort="Darwish, M" uniqKey="Darwish M" first="M." last="Darwish">M. Darwish</name>
</author>
<author><name sortKey="Farag, Ibrahim" sort="Farag, Ibrahim" uniqKey="Farag I" first="Ibrahim" last="Farag">Ibrahim Farag</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:3E3F186B8873FE74B41C4CF4826422234F1E68BA</idno>
<date when="1999" year="1999">1999</date>
<idno type="doi">10.1007/978-3-540-48765-4_57</idno>
<idno type="url">https://api.istex.fr/document/3E3F186B8873FE74B41C4CF4826422234F1E68BA/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000546</idno>
<idno type="wicri:Area/Istex/Curation">000539</idno>
<idno type="wicri:Area/Istex/Checkpoint">001419</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Mostafa K:a:novel:approach</idno>
<idno type="wicri:Area/Main/Merge">001F77</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:99-0397539</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000813</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000B81</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000799</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Mostafa K:a:novel:approach</idno>
<idno type="wicri:Area/Main/Merge">002180</idno>
<idno type="wicri:Area/Main/Curation">001E68</idno>
<idno type="wicri:Area/Main/Exploration">001E68</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems</title>
<author><name sortKey="Mostafa, Khaled" sort="Mostafa, Khaled" uniqKey="Mostafa K" first="Khaled" last="Mostafa">Khaled Mostafa</name>
<affiliation wicri:level="1"><country xml:lang="fr">Égypte</country>
<wicri:regionArea>Information Technology Department, Faculty of Computers and Information, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Shaheen, I" sort="Shaheen, I" uniqKey="Shaheen I" first="I." last="Shaheen">I. Shaheen</name>
<affiliation wicri:level="1"><country xml:lang="fr">Égypte</country>
<wicri:regionArea>Computer Engineering Department, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Égypte</country>
</affiliation>
</author>
<author><name sortKey="Darwish, M" sort="Darwish, M" uniqKey="Darwish M" first="M." last="Darwish">M. Darwish</name>
<affiliation wicri:level="1"><country xml:lang="fr">Égypte</country>
<wicri:regionArea>Computer Engineering Department, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Égypte</country>
</affiliation>
</author>
<author><name sortKey="Farag, Ibrahim" sort="Farag, Ibrahim" uniqKey="Farag I" first="Ibrahim" last="Farag">Ibrahim Farag</name>
<affiliation wicri:level="1"><country xml:lang="fr">Égypte</country>
<wicri:regionArea>Institute of Statistical Studies and Research, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>1999</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">3E3F186B8873FE74B41C4CF4826422234F1E68BA</idno>
<idno type="DOI">10.1007/978-3-540-48765-4_57</idno>
<idno type="ChapterID">57</idno>
<idno type="ChapterID">Chap57</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Arabic</term>
<term>Handwriting recognition</term>
<term>Intelligent system</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Arabe</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance écriture</term>
<term>Système intelligent</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In this paper, we propose a new approach for detecting and correcting segmentation and recognition errors in Arabic OCR systems. The approach is suitable for both typewritten and handwritten script recognition systems. Error detection is based on rules of the Arabic language and a morphology analyzer. This type of analysis has the advantage of limiting the size of the dictionary to a practical size. Thus, a complete dictionary for roots, which does not exceed 5641 roots, the morphological rules and all valid patterns can be kept in a moderate size file. Recognition channel characteristics are modeled using a set of probabilistic finite state machines. Contextual information is utilized in the form of transitional probabilities between letters of previously defined vocabulary (finite lexicon) and transitional probabilities of garbled text. The developed detection and correction modules have been incorporated as a post-processing phase in an Arabic handwritten cursive script recognition system. Experimental results show a considerable enhancement in performance.</div>
</front>
</TEI>
<affiliations><list><country><li>Égypte</li>
</country>
</list>
<tree><country name="Égypte"><noRegion><name sortKey="Mostafa, Khaled" sort="Mostafa, Khaled" uniqKey="Mostafa K" first="Khaled" last="Mostafa">Khaled Mostafa</name>
</noRegion>
<name sortKey="Darwish, M" sort="Darwish, M" uniqKey="Darwish M" first="M." last="Darwish">M. Darwish</name>
<name sortKey="Farag, Ibrahim" sort="Farag, Ibrahim" uniqKey="Farag I" first="Ibrahim" last="Farag">Ibrahim Farag</name>
<name sortKey="Shaheen, I" sort="Shaheen, I" uniqKey="Shaheen I" first="I." last="Shaheen">I. Shaheen</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001E68 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001E68 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:3E3F186B8873FE74B41C4CF4826422234F1E68BA |texte= A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems }}
This area was generated with Dilib version V0.6.32.